Importing the necessary libraries and data

Import the data

Data Overview

Automated EDA

Transform Object Data into Categorical Data
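
A minimal sketch of this step, assuming the data lives in a pandas DataFrame. The toy frame and the column names `brand_name` and `used_price` are placeholders, not the notebook's actual data.

```python
import pandas as pd

# Toy stand-in for the real data; column names are assumptions.
df = pd.DataFrame({
    "brand_name": pd.Series(["Google", "OnePlus", "XOLO", "Google"], dtype=object),
    "used_price": [120.0, 150.5, 40.0, 110.0],
})

# Cast every object-dtype column to pandas' category dtype, one column at a time.
object_cols = df.select_dtypes(include="object").columns
for col in object_cols:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

Casting column by column keeps the dtype change explicit and leaves numeric columns untouched.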

Conduct Manual EDA

New and used prices appear to contain a significant number of outliers

Identify Outliers
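
One common way to flag outliers is the 1.5 × IQR rule. The sketch below assumes that rule (the notebook may have used a different criterion) and hypothetical columns `new_price` and `used_price` filled with toy values.

```python
import pandas as pd

# Toy prices; the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "new_price": [100, 110, 120, 130, 140, 150, 2000],
    "used_price": [50, 55, 60, 65, 70, 75, 80],
})

def iqr_outlier_mask(series, k=1.5):
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# One boolean column per price column; summing counts the flagged rows.
outlier_mask = df[["new_price", "used_price"]].apply(iqr_outlier_mask)
print(outlier_mask.sum())
```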

Removing Outliers from the DataFrame
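
A sketch of dropping the flagged rows, again assuming the 1.5 × IQR fences and the hypothetical `new_price` and `used_price` columns. A row is kept only if every price column falls inside its fences.

```python
import pandas as pd

# Toy prices; the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "new_price": [100, 110, 120, 130, 140, 150, 2000],
    "used_price": [50, 55, 60, 65, 70, 75, 80],
})
price_cols = ["new_price", "used_price"]

# Per-column quartiles and fences (each is a Series indexed by column name).
q1 = df[price_cols].quantile(0.25)
q3 = df[price_cols].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep rows whose values are inside the fences for all price columns.
within = ((df[price_cols] >= lower) & (df[price_cols] <= upper)).all(axis=1)
df_clean = df[within].reset_index(drop=True)

print(len(df), "->", len(df_clean))
```

Removing rows rather than capping values shrinks the dataset, so it is worth checking how many rows are lost before committing to this choice.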

Identifying Missing Values
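
A minimal sketch of counting missing values per column. The columns `battery` and `weight` are named after variables mentioned later in this notebook, but the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps; the values are assumptions for illustration.
df = pd.DataFrame({
    "battery": [3000.0, np.nan, 4000.0, 5000.0],
    "weight": [150.0, 160.0, np.nan, np.nan],
})

# Count and percentage of missing entries per column.
missing_counts = df.isna().sum()
missing_percent = df.isna().mean().mul(100).round(1)

print(pd.DataFrame({"missing": missing_counts, "percent": missing_percent}))
```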

Transforming Missing Values to the Median
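
A sketch of median imputation for numeric columns, assuming each column's own median is used as the fill value. The data is a toy example.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps; the column names and values are assumptions.
df = pd.DataFrame({
    "battery": [3000.0, np.nan, 4000.0, 5000.0],
    "weight": [150.0, 160.0, np.nan, 170.0],
})

# Fill each numeric column's missing entries with that column's median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

print(df)
```

The median is less sensitive to extreme prices than the mean, which fits a dataset where outliers were a concern.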

Conducting Additional EDA

Transforming Categorical Data to Dummy Variables
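
A sketch of one-hot encoding with `pd.get_dummies`. The column name `brand_name` and the use of `drop_first=True` (to avoid a redundant dummy column) are assumptions, not confirmed by the notebook.

```python
import pandas as pd

# Toy frame; the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "brand_name": pd.Categorical(["Google", "OnePlus", "XOLO", "Google"]),
    "used_price": [120.0, 150.5, 40.0, 110.0],
})

# Replace the categorical column with indicator columns; the first
# category ("Google" here) becomes the implicit baseline.
df_encoded = pd.get_dummies(df, columns=["brand_name"], drop_first=True)

print(df_encoded.columns.tolist())
```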

Loading more libraries

Splitting the Data Set
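
A sketch of the split using scikit-learn's `train_test_split`. The 70/30 ratio, the `random_state`, the target name `used_price`, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned, encoded data.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "screen_size": rng.uniform(10, 18, n),
    "battery": rng.uniform(2000, 5000, n),
    "used_price": rng.uniform(50, 300, n),
})

# Separate the predictors from the target, then hold out 30% for testing.
X = df.drop(columns="used_price")
y = df["used_price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

print(X_train.shape, X_test.shape)
```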

Model Building

Fitting the model to the training set

Get the score on training set

Get the score on test set
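
The fit and the two scores can be sketched as follows, assuming scikit-learn's `LinearRegression`, whose `score` method returns R². The data is synthetic with a known linear relationship, so the numbers are not the notebook's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on two features plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 1.0, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Fit on the training set only, then score both sets.
model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(f"train R^2: {train_r2:.3f}, test R^2: {test_r2:.3f}")
```

A test score far below the training score would suggest overfitting; similar scores suggest the model generalizes to unseen rows.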

Get the RMSE on train set

Get the RMSE on test set
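
A sketch of computing RMSE on both sets as the square root of scikit-learn's `mean_squared_error`. The data is synthetic with noise of standard deviation 1, so the RMSE should land near 1; these are not the notebook's figures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship and unit-variance noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 1.0, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
model = LinearRegression().fit(X_train, y_train)

# RMSE is the square root of the mean squared prediction error.
rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

print(f"train RMSE: {rmse_train:.3f}, test RMSE: {rmse_test:.3f}")
```

RMSE is expressed in the units of the target, so for a price model it reads directly as a typical prediction error in currency.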

Get the model coefficients

Automate the equation of the fit
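
One way to build the equation string from a fitted scikit-learn model is sketched below. The feature names, the target name `used_price`, and the noise-free toy data are assumptions chosen so the recovered coefficients are easy to check.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Noise-free toy data, so the fit recovers the generating coefficients.
rng = np.random.default_rng(0)
feature_names = ["screen_size", "battery"]
X = pd.DataFrame(rng.uniform(0, 10, size=(50, 2)), columns=feature_names)
y = 2.0 * X["screen_size"] + 0.5 * X["battery"] + 10.0

model = LinearRegression().fit(X, y)

# Pair each coefficient with its feature name and join the terms.
terms = [f"({coef:.2f}) * {name}" for coef, name in zip(model.coef_, feature_names)]
equation = f"used_price = {model.intercept_:.2f} + " + " + ".join(terms)

print(equation)
```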

Model Building (statsmodels)

Make the linear model using statsmodels OLS and print the model summary.

Get the value of the coefficient of determination.

Automate the equation of the fit

Seven variables (screen size, battery, weight, and the Google, OnePlus, Splice, and XOLO brand indicators) affected the regression line of best fit dramatically. Including these variables causes the price predictions to be less accurate for the dataset provided.